Reduct and Variance Based Clustering of High Dimensional Dataset

نویسندگان

  • Dharmveer Singh Rajput
  • Pramod Kumar Singh
  • Mahua Bhattacharya
چکیده

In high dimensional data, general performance of the traditional clustering algorithms decreases. As some dimensions are likely to be irrelevant or contain noisy data and randomly selected initial centre of the clusters converge the clustering to local minima. In this paper, we propose a framework for clustering high dimensional data with attribute subset selection and efficient cluster centre initialization. It uses rough set theory to determine the relevant attributes (dimensions) in first phase. In second phase, maximum variance dimension is used to determine the optimal initial centres of the clusters. The kmeans clustering algorithm is applied with these initial cluster centres, in phase three, to find optimal clustering of data set. It improves efficiency of the clustering process tremendously and our experiment on test data set shows that accuracy of the results has improved considerably.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Hybrid K-Mean-Quick Reduct Algorithm for Gene Selection

Feature selection is a process to select features which are more informative. It is one of the important steps in knowledge discovery. The problem is that all genes are not important in gene expression data. Some of the genes may be redundant, and others may be irrelevant and noisy. Here a novel approach is proposed Hybrid K-Mean-Quick Reduct (KMQR) algorithm for gene selection from gene expres...

متن کامل

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

متن کامل

Constructing Minimal Spanning Tree Based on Rough Set Theory for Gene Selection

Microarray gene dataset often contains high dimensionalities which cause difficulty in clustering and classification. Datasets containing huge number of genes lead to increased complexity and therefore, degradation of dataset handling performance. Often, all the measured features of these high-dimensional datasets are not relevant for understanding the underlying phenomena of interest. Dimensio...

متن کامل

An Efficient and Generic Hybrid Framework for High Dimensional Data Clustering

Clustering in high dimensional space is a difficult problem which is recurrent in many fields of science and engineering, e.g., bioinformatics, image processing, pattern reorganization and data mining. In high dimensional space some of the dimensions are likely to be irrelevant, thus hiding the possible clustering. In very high dimensions it is common for all the objects in a dataset to be near...

متن کامل

Classification of Chronic Kidney Disease Patients via k-important Neighbors in High Dimensional Metabolomics Dataset

Background: Chronic kidney disease (CKD), characterized by progressive loss of renal function, is becoming a growing problem in the general population. New analytical technologies such as “omics”-based approaches, including metabolomics, provide a useful platform for biomarker discovery and improvement of CKD management. In metabolomics studies, not only prediction accuracy is ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010